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Abstract 

Phylogenetic trees and networks are leaf-labelled graphs that are used to describe evolutionary histories of 
species. The Tree Containment problem asks whether a given phylogenetic tree is embedded in a given 
phylogenetic network. Given a phylogenetic network and a cluster of species, the Cluster Containment 
problem asks whether the given cluster is a cluster of some phylogenetic tree embedded in the network. 
Both problems are known to be NP-complete in general. In this article, we consider the restriction of 
these problems to several well-studied classes of phylogenetic networks. We show that Tree Containment 
is polynomial-time solvable for normal networks, for binary tree-child networks, and for level-A; networks. 
On the other hand, we show that, even for tree-sibling, time-consistent, regular networks, both Tree 
Containment and Cluster Containment remain NP-complete. 
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1. Introduction 

Rooted trees, and more generally digraphs, are widely used to represent evolutionary relationships in 
biology [H in [n]. Such a digraph typically has a set X of labelled leaves (vertices of outdegree 0) which 
corresponds to the collection of present-day species under study. The arcs in the digraph are directed away 
from a single 'root' vertex which represents the evolutionary ancestor of the species in X. The remaining 
vertices of the digraph are usually unlabelled; they represent hypothetical ancestral species, and an arc 
from u to V indicates that ancestral species u contributes directly to the genetic makeup of v. 

The simplest phylogenetic networks are trees, and these have traditionally been used when evolution is 
described purely by the formation of each new species from an existing one. However, processes of reticulate 
evolution (in particular the formation of hybrid species, and lateral gene transfer) mean that a new species 
can have genetic contributions from more than one ancestral species, which is the reason digraphs are 
increasingly seen as a desirable model in molecular systematics [S]. More precisely, if one considers the 
evolution of a particular gene, its evolution will generally be described by a tree, but reticulate processes 
mean that different genes can have different evolutionary tree structures, that can only be adequately 
reconciled by fitting them into a phylogenetic network. 

This raises a fundamental question in computational phylogenetics: given a tree and a phylogenetic 
network, both with leaf set X , is there an efficient algorithm to determine whether the tree 'fits inside' the 
network (in a sense we define precisely below)? In general this question is NP-hard, even for certain restricted 
classes of phylogenetic networks. However, we show that, for particular classes, there exist polynomial-time 
algorithms which also provide a recipe for finding an explicit embedding of the tree in the phylogenetic 
network. In addition, we consider the computational complexity of deciding whether a subset of A" is a 
cluster of some tree that sits inside a given network. 
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The structure of this article is as fohows. After giving formal definitions in the next section, we present our 
polynomial-time algorithms for the 'tree containment' problem in SectionjSj discuss the 'cluster containment' 
problem in Section |4] show the computational intractability of both problems for certain classes of networks 
in Section [5j and finish with an open problem in Section [6j 

2. Definitions 

Consider a set X of taxa. A rooted phylogenetic network (network for short) on is a directed acyclic 
(simple) graph with a single root (vertex with indegree 0), leaves (vertices with outdegree 0) bijectively 
labeled by X and no vertices with indegree and outdegree one. We identify each leaf with its label and refer 
to the directed edges (arcs) as edges. 

A vertex is called a reticulation vertex [reticulation for short) if it has indegree at least two and a tree- 
vertex otherwise. An edge is called a tree-edge if it ends in a tree-vertex and it is called a reticulation- edge 
otherwise. A tree-path is a directed path that contains only tree-edges. A network with no reticulations is 
said to be a rooted phylogenetic tree (tree for short). A network is said to be binary if each reticulation has 
indegree two and outdegree one and all of the vertices have outdegree at most two. 

Given two vertices Vi, V2 (of some tree or network), we use Vi < V2 to denote that there is a directed path 
from vi to V2- If vi < V2 and vi ^ V2, we write vi < V2- For a vertex v, we define cl{v) = {x G X : v < x}. 
If {u,v) is an edge, then we say that u is a parent of v and w is a child of u. If two vertices vi,V2 have a 
common parent, then they are said to be siblings. An edge e of a network iV is a cut-edge if removal of e 
disconnects the undirected graph underlying N. A cut-edge is trivial if it ends in a leaf. 

Given a tree T and network N, we say that N displays T if there is a subgraph T' of N that is a 
subdivision of T (i.e. T' can be obtained from T by replacing edges by directed paths). For a tree T on X 
and a subset X' C X, we use T\X' to denote the tree on X' that is displayed by T. 

These concepts are illustrated in Fig. [T] 
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Figure 1: Left: A rooted phylogenetie network on X = {a, 6, c,d, e}. This network is normal, but not binary and has one 
reticulation vertex. Middle: One of the two rooted phylogenetic trees on X = {a, b, c, d, e} that are displayed by this network. 
Right: If we add either of the dashed edges, the resulting network is no longer normal; adding the steeper, left-most arrow 
violates condition {R3) while adding the right-most arrow results in a network that violates the tree-child property for the 
parent of b and c. 

In the first part of this paper we consider the following fundamental decision problem: 
Tree Containment 

Instance: A set X of taxa, a rooted phylogenetic network TV on A" and a rooted phylogenetic tree T 
on X . 

Question: Does N display T? 

This problem is known to be NP-complete [5] for the general class of rooted, binary phylogenetic networks. 
Here we study the restriction of this problem to some established classes of rooted phylogenetic networks. 
We then consider the complexity of a related 'cluster containment' problem. 

2.1. Classes of Phylogenetic Networks 

A phylogenetic network N is said to be regular if for any two distinct vertices u,v of N 

(Rl) cl{u) ^ cl{v). 
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(R2) u < u if and only if cl{u) D cl{v); 

(R3) there is no edge {u, v) if there is also a directed path from u to w of length greater than one. 

A phylogenetic network TV is said to be tree-child (see e.g. [4J) if each vertex of N cither is a leaf or 
has a child that is a tree-vertex. It follows immediately that from each vertex of a tree-child network there 
exists a tree-path to a leaf. A network is said to be normal if it is tree-child, has no vertices of outdegree 
1, and in addition condition (R3) above holds. It has been shown that any normal network is regular, i.e. 
that automatically conditions (Rl) and (R2) hold [13]. 

3. Polynomial-Time Algorithms 

This section describes our main results, polynomial-time algorithms for Tree Containment restricted 
to normal networks and to binary tree-child networks. We first give an algorithm for normal networks in 
Section |3.1[ because this algorithm is simpler and will be used as a subroutine for the algorithm for binary 
tree-child networks in Section [X2] 

3.1. Normal Networks 

We show that, given a normal phylogenetic network N and a phylogenetic tree T, one can decide in 
polynomial time whether N displays T. We propose the following algorithm. An example is in Figure [2j 

Algorithm LocateNormal 

For each reticulation r of N, do the following. 

1. Find a leaf Xr that can be reached from r by a tree-path. 

2. Construct a set X' consisting of a;,, and, for each parent p of r, a leaf that can be reached from p by a 
tree-path. 

3. If Xr does not have a sibling in T\X', report that N does not display T. 

4. If Xr does have a sibling in T\X' , let Xs be such a sibling and Ps the reticulation from which Xg can 
be reached by a tree-path. Delete all edges entering r except for {ps,r). 

When there are no reticulations left, the network has been transformed into a phylogenetic tree T' . Check 
if T' is a subdivision of T. If it is, report that N displays T. Otherwise, report that N does not display T. 

Theorem 1. Given a normal phylogenetic network N on X and a phylogenetic tree T on X , algorithm LOC- 
ATeNormal decides in polynomial time whether N displays T . Moreover, if N displays T, LocateNormal 
finds the unique subtree of N that is a subdivision of T . 

Proof. We first show that, for any reticulation r, the leaves in the set X' described by the algorithm (which 
exist by the tree-child property) are always distinct. To show this, assume that two such tree-paths end in 
the same leaf. Since tree-paths cannot combine in a reticulation, the only remaining possibility is that, for 
two parents Pa,Pb of r, the tree-path from pa to a leaf passes through pb. However, this implies the existence 
of a directed path pa ^ r oi length greater than one, contradicting (R3), since there is also an edge {pa,r). 

We now prove that LocateNormal reports that N displays T if and only if this is the case. If the 
algorithm reports that TV displays T, then it is clear that TV indeed displays T, since the algorithm checks if 
the constructed subtree T' of TV is a subdivision of T. Now suppose that TV displays T. Then there exists 
a subtree T' of TV that is a subdivision of T. Take any reticulation r and X' as in the algorithm. Observe 
that the root of T' is the root of TV by property (Rl). Thus, by this observation and the fact that there is a 
tree-path from r to a leaf, r and exactly one of the edges entering r are included in T' . Let (p' , r) be the edge 
entering r that is included in T' and x' the leaf in X' that can be reached from p' by a tree-path. Then x' is 
a sibling of Xr in T'\X' (which is isomorphic to T\X') and hence p' = Pg. It follows that LocateNormal 
deletes all reticulation-edges not included in T' , thus reconstructing T' . Moreover, since there is no choice 
which reticulation-edges to delete, the subtree T' is unique. 

The running-time is clearly polynomial in the size of the input. Moreover, the number of vertices of a 
normal network grows at most quadratically with the number of leaves |13j . Hence, the running-time is also 
polynomial in the number of leaves. □ 
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Figure 2: Illustration of algorithm LocateNormal for tree T in (a) and normal network N in (b). Dotted edges are removed 
from N\ in (b) because a and b are siblings in T|{a, b, c}, in (c) because c and d are siblings in T|{a, c, ci}, and in (d) because d 
and e are siblings in T|{d, e, /}. 



3.2. Tree-Child Networks 

This subsection shows that, given a binary tree-child phylogenetic network N and a phylogenetic tree T, 
one can decide in polynomial time whether N displays T. We partition the reticulations of N into four 
types. 

Type I. There is no tree-path between the parents of the reticulation. 
Type II. There is a singe edge connecting the parents of the reticulation. 

Type III. There is a tree-path between the parents of the reticulation that contains a vertex that is the 
parent of two tree-vertices. 

Type IV. There is a tree-path between the parents of the reticulation that contains at least one internal 
vertex but no vertex that is the parent of two tree-vertices. 

Note that in a normal network all reticulations are of Type I, because of restriction (R3). Thus, the 
difficulty in generalizing the algorithm for normal networks in Section [XT] is that the algorithm for tree-child 
networks below also has to take reticulations of Types II, III and (especially) IV into account. 

Algorithm LocateTreeChild 

Repeat the following four steps until none is applicable. 

Step 1. If there is a reticulation of Type I, proceed as for normal networks. 

Step 2. If there is a reticulation of Type II, choose one of its two incoming edges arbitrarily and remove it. 
Step 3. If there is a reticulation r of Type III, do the following. 

3.1 Let Pa and pb be the parents of r such that Pa < Pb- Pick a vertex v that lies on the path from pa 
to Pb and is the parent of two tree- vertices. Let c be the child of v that does not lie on the path 
between the parents of r. Let Xr, Xb and Xc be leaves that can be reached from r,pb and c respectively 
by tree-paths. 

3.2 If Xb and x^ are siblings in T|{xf,, x^, xd, remove the edge (pai^) from N. 

3.3 If Xb and Xc are siblings in T\{xb,Xr,Xc}, remove the edge {pb,r) from N. 

3.4 Otherwise, report that N does not display T. 

Step 4. If there is a reticulation of Type IV, do the following. 

4.1 If contains a nontrivial cut-edge {u, v), let C be the set of taxa reachable from v. Find a vertex v' 
of T with C = cl{v') (if there is no such vertex, report that N does not display T). Construct from T 
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by deleting all descendants of v' and labeling v' by a new taxon Xc ^ X . Construct Nc from N by 
deleting all descendants of v and labeling v by Xc. Construct iV|C by restricting N to the vertices and 
edges reachable from v. Run the algorithm recursively for {Nc,Tc) and {N\C,T\C). If Nc displays Tc 
and N\C displays T\C, report that N displays T. Otherwise, report that N does not display T. 

4.2 Contract all vertices with indegree and outdegree 1. Find a leaf x that has a reticulation as sibling. 
Construct a tree-path P as follows. Initialise P as a; and, as long as the first vertex ti of P has a 
reticulation as sibling, add the parent of v to P. Construct a set X' by including x and, for each 
reticulation that has a parent on P, a leaf that can be reached from this reticulation by a tree-path. 

4.3 If X does not have a sibling in T\X' , report that N does not display T. 

4.4 If X does have a sibling Xg in T\X', let r' be the reticulation from which Xs can be reached by a 
tree-path. Let p'^j and be the parents of r' such that p'^j < Remove the edge {p'^, r') from A^. 

When there are no more reticulations left, check if the resulting phylogenetic tree T' is a subdivision 
of T. If it is, report that N displays T. Otherwise, report that N does not display T. 

Theorem 2. Given a binary tree-child phylogenetic network N on X and a phylogenetic tree T on X , 
algorithm LocateTreeChild decides in polynomial time whether N displays T . 

Proof. If there exist reticulations of Type I or III, it can be shown analogously to the proof of Theorem [T] 
that Steps 1 and 3 of LocateTreeChild either resolve these reticulations in the only possible way, or 
conclude that N does not display T. It is furthermore clear that Step 2 correctly deals with reticulations of 
Type II (because N is binary) and Step 4.1 with nontrivial cut-edges. Thus, from now on, we assume that 
all reticulations are of Type IV and that there are no nontrivial cut-edges. 

We first show that a leaf x as described by Step 4.2 indeed exists, i.e. we show that there is a leaf that 
has a sibling which is a reticulation. Consider a tree-path Q of maximum length. Because Q is a tree-path, 
it ends in a tree- vertex x, which must be a leaf because N is tree-child and Q is of maximum length. The 
single parent of x is a tree-vertex because, if it were a reticulation, Q would consist of just x but, by the 
existence of a Type IV reticulation, there exists a tree-path of length at least two. Thus, leaf x has a sibling, 
which has to be a reticulation because otherwise it would be possible to extend Q to a longer tree-path. 
This shows the existence of a leaf x as described in Step 4.2. 

We now show that the algorithm reports that N displays T precisely when this is the case. The algorithm 
checks if the constructed subgraph T' is a subdivision of T, so it will clearly never report that N displays T if 
this is not the case. So assume that N does display T. It remains to show that the algorithm will construct 
some subdivision of T in N . Observe that, apart from x, all vertices on the path P have two children: one 
reticulation and one tree- vertex (which lies on P). Let R be the set of all reticulations with a parent on P. 
Thus, X' consists of x and, for each reticulation in P, a leaf that can be reached from this reticulation by a 
tree-path. By the choice of a;, the path P contains at least one other vertex, and so the set X' contains at 
least two leaves. 

We claim that all parents of reticulations in R lie on P. Assume that this is not the case. Then there 
is a reticulation f with two parents PajPh such that Pa < pt and one of Pa,Pb is not on P while the other 
one is. Since P and the path pa < Pb both contain only tree-edges, we must have that pa is on P while pb is 
not. However, then the path pa — ^ pt contains a vertex that is the parent of two tree-vertices, which means 
that f is of Type III. This is a contradiction because the algorithm has already resolved all reticulations of 
Type III in Step 3. 

Thus, apart from x, P consists precisely of the parents of reticulations in R. Since N displays T, there 
exists a subtree T' of N that is a subdivision of T. We next claim that x has a sibling in T'\X' . Let Vg 
be the last vertex on the path P for which the reticulation-edge leaving is included in T' (such a vertex 
exists because one of the edges entering the sibling of x is included in T'). Let Xg be the leaf in X' that can 
be reached from the reticulation-child of Vg by a tree-path. Clearly, Xg and x are siblings in T'\X' (which is 
isomorphic to T\X'). Thus, the algorithm removes the edge (pj,,?"') from N, with r' the reticulation-child 
of Vg and p'^ equal to either Vg or the other parent of r' . 

First suppose that p'^ is not equal to Vg. Thus, Vg = p'^, and pjj is the other parent of r'. In this case, the 
algorithm correctly removes the edge (p^,?"') that is not in T' and we are done. 
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Now assume that pj, = Vg and thus that the algorithm removes the edge {p'^, r') that is in T' . We replace 
in T' the edge (pj,, r') by the edge r'). The resulting subgraph of N is again a subdivision of T, because 
none of the reticulation-edges leaving vertices below p'^ = Vs were included in T' (by the choice of Vg). See 
Figure [3] for an illustration. We conclude that the algorithm correctly constructs a subdivision of T in N. 

The running-time is clearly polynomial in the size of the input. Moreover, the number of reticulations in 
a tree-child network is at most n = jA"! and it can be shown by induction on the number rir of reticulations 
that the number of edges of a binary tree-child network is at most 2n + Sn,. — 2 < 5n — 2. Hence, the 
running-time is also polynomial in the number of leaves. □ 




Figure 3: Illustration of the last case in the proof of Theorem [2] A subdivision of the tree T in the network is indicated by 
solid lines. Replacing edge by (p^,»'') gives another subdivision of T in A', because (Pa,'"') is the last edge leaving the 

path P (the vertical path) that is used by the subdivision. 



4. Cluster Containment 

A cluster is a strict subset of X. There are two different ways of seeing clusters in networks. Let w be a 
vertex of a network N on X. The hardwired cluster of v, denoted cl{v), is the set containing all taxa that 
can be reached from v. A cluster is said to be a hardwired cluster of N if it is the hardwired cluster of some 
vertex of N. A cluster C C X is said to be a softwired cluster of w if C equals the set of all taxa that can be 
reached from v when, for each reticulation r, exactly one incoming edge of r is "switched on" and the other 
incoming edges of r are "switched off" . Notice that each vertex has exactly one hardwired cluster and one 
or more softwired clusters. We say that C is a softwired cluster of a network N if it is a softwired cluster of 
some vertex of N. The softwired clusters of N can elegantly be characterized as follows. A cluster C C X 
is a softwired cluster of N if and only if there exists a tree T such that C is a hardwired cluster of T and T 
is displayed by N. We are interested in the following decision problem. 

Cluster Containment 

Instance: A set X of taxa, a rooted phylogenetic network N on X and a cluster C C X. 
Question: Is C a softwired cluster of A^? 

This problem is known to be NP-complcte [9 for general binary phylogenetic networks. However, it 
has a polynomial-time algorithmic solution if we restrict N to tree-child networks 7 . We will show below 
(Theorem [3| that, if we extend the class of normal networks to regular networks, the problem Cluster 
Containment becomes NP-hard, even if we add further structural restrictions to this class of networks. 
But first we describe yet another class of networks for which both problems -Tree Containment and 
Cluster Containment - have polynomial-time algorithms. 
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Level-k Networks 

We finish tiiis section by observing that both Tree Containment and Cluster Containment are 
polynomial-time solvable for the class of binary level-k networks. A binary network is biconnected if it 
contains no cut-edges. A biconnected subgraph i? of a binary network N is said to be a biconnected 
component if there is no biconnected subgraph B' ^ B oi N that contains B. A binary phylogenetic 
network is a level-k network if each biconnected component has at most k reticulations. 

Observation 1. Tree Containment and Cluster Containment are polynomial-time solvable when 
restricted to binary level-k networks, for any fixed k. 

Proof. We may assume that the network contains no nontrivial cut-edges, because Step 4.1 of algorithm 
LocateTreeChild can be applied until there are no nontrivial cut-edges left. A level-fc network with no 
nontrivial cut-edges contains at most k reticulations. Thus, we can loop through all 2^ ways of selecting 
one incoming edge for each of the k reticulations. For each of the resulting trees, we check whether it is 
a subdivision of the input tree (in case of the Tree Containment problem) or check whether the input 
cluster is a cluster of the obtained tree (in case of the Cluster Containment problem). □ 

5. NP-Completeness for Tree-Sibling, Time-Consistent, Regular Networks 

A phylogenetic network is said to be tree-sibling (see e.g. [3]) if each reticulation has a sibling that is 
a tree-vertex. A phylogenetic network N is said to be time- consistent (see e.g. [2l[T0]) if it is possible to 
assign each vertex v of N a, "time stamp" t{v) G R-° such that for each edge (w, v) of N: 

(TCl) t{u) < t{v) if {u,v) is a tree-edge and 
(TC2) t{u) — t{v) if {u,v) is a reticulation-edge. 

Theorem 3. Tree Containment and Cluster Containment are both NP-complete when restricted to 
tree-sibling, time-consistent, regular phylogenetic networks. 

Proof. We reduce from the Tree Containment and Cluster Containment problems on general net- 
works, which were shown to be NP-complete by Kanj et al. ,9J. 

Let A" be a set of taxa, and let N and T be a network and tree on X, respectively. We will modify N to 
a tree-sibling, time-consistent, regular network N' on a set of taxa X' Z) X and show that a cluster C <Z X 
is a softwired cluster of N' if and only if it is a softwired cluster of N . Since this modification can be 
carried out in polynomial time and the size of N' is polynomial in the size of N , this shows NP-hardness 
of Cluster Containment on tree-sibling, time-consistent, regular networks. Furthermore, we will (in 
polynomial time) modify T to a tree T' on X' and show that N' displays T' if and only if N displays T, 
thus showing NP-hardness of Tree Containment on this restricted class of networks. 

The construction of N' and T' relies on repeatedly applying the following operation. For a vertex v 
of N, define HangLeaves(w) as making the following changes to N, T and X. We add two new taxa x,x' 
to X. Let r be the root of N. We add to N leaves x,x', a new root r', an internal vertex p and 
edges [r' ,r),{r' ,p),{p,x'),{p,x) and {v,x). Let be the root of T. We add to T the leaves x,x' , a 
new root r^, an internal vertex pT and edges (r^, ry), (rip^px), {pt, x) and {pt, x'). See Figure|4] 

We now describe how we transform N into a tree-sibling, time-consistent, regular network by repeated 
applications of HangLeaves. First, we make the network regular by doing the following for each pair u,v 
of distinct vertices of N . 

1. If cl{u) = cl{v), then apply HangLeaves (u) and HangLeaves(u). 

2. If cl{u) D cl{v) but there is no directed path from u to v, then apply HANGLEAVES(t;). 

3. If there exist two distinct directed paths from u to v, one of which is an edge, then subdivide this edge 
with a single vertex w and apply HangLeaves(w). 
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Figure 4: How HangLeaves(i;) modifies N and T. Vertices r and tt are the roots of N and T respectively. 

For any two vertices u,v of A'', operations 1, 2 and 3 make sure that properties (Rl), (R2) and (R3) 
(respectively) of a regular network are satisfied. Furthermore, by the definition of HangLeaves, these 
properties are also satisfied for newly added vertices. It follows that the obtained network is regular. Call 
this network Nr- 

The next step is to make the network time-consistent. For each reticulation-edge {u, v) of A^^, subdivide 
{u,v) by a new vertex w and apply HangLeaves(u;). Let N^tc be the resulting phylogcnetic network. 

Wc claim that Nrtc is time-consistent. Let t : V"(A'^,.) ^ N be a labeling of the vertices of Nr such 
that t{u) < t{v) for each edge {u,v). This is possible because Nr is acyclic. We specify a label t'{v) (the 
time-stamp) for each vertex v of Nrtc as follows. Each vertex of Nrtc that is also a vertex of Av gets the 
same label as in A^^, i-c t'{v) = t(v) for all v E V{Nr). Now consider a rcticulation-edgc (u.v) of A^^- 
Such an edge corresponds to two edges {u,w) and {w,v) of Nrtc- Now label vertex w the same as vertex v, 
i.e. t'{w) = t'{v) = t{v). Observe that {w,v) is a reticulation-edge and now satisfies restriction (TC2) of 
time-consistency. FTirthcrniorc, wc have t' {u) < f/{w) and so {u,w), which is a trcc-cdge, satisfies restriction 
(TCI) of time-consistency. It remains to label the vertices that have been added by HangLeaves. This can 
easily be done in such a way that the restrictions of time-consistency are satisfied. Namely, we give x and p 
the same label as v, give r' any label that's smaller than t'(r) and x' any label that is greater than t'(p) 
(processing vertices in the same order as in which they have been added by applications of HangLeaves). 

Finally, we make the network tree-sibling. For each reticulation r of Nrtc, that has not been added 
by HangLeaves, we do the following. Observe that, as a result of the modifications that made Nrtc 
time-consistent, r has two siblings, both of which are reticulations added by two different applications of 
HangLeaves. Pick any of the two siblings and call it x. Let v be the common parent of x and r. Subdivide 
edge {v,x) by a new vertex w and apply HangLeaves (w). Reticulation r now has a sibling that is a tree- 
vertex, namely w. Moreover, all reticulations x added by applications of HangLeaves have a sibling x' 
that is a tree-vertex. Hence, the resulting network is tree-sibling. Let A''' be this resulting network and T' 
the resulting tree. 

We claim that A^' is not only tree-sibling, but also still regular and time-consistent. To see that A'^' 
is regular, observe that it has been obtained from the regular network Nr by repeatedly subdividing an 
edge by a new vertex w and applying HangLeaves (w). It can easily be checked that a regular network 
remains regular after such a modification. To see that A'^' is also time-consistent, observe that it has been 
constructed from time-consistent network Nrtc by repeatedly subdividing an edge {v, x) by a new vertex w 
and applying HangLeaves(u>). Using that a; is a leaf added by HangLeaves, it can easily be checked 
that a time-consistent network remains time-consistent after such a modification. Thus, A^' is a tree-sibling, 
time-consistent, regular network. 

It remains to show that (i) a cluster C C A' is a softwired cluster of A^' if and only if it is a softwired 
cluster of A^ and (ii) A'^' displays T' if and only if A^ displays T. The crux to showing these things is 
that A''' and T' have been obtained from A'' and T by subdividing edges and applying HangLeaves. By 
this observation, (i) is clear. 

To see (ii), let A^*, T* and X* be the result of a single application of HangLeaves(i;) to N,T,X. We 
claim that A''* displays T* if and only if N displays T. First note that an embedding of T in A'' can easily 
be extended to an embedding of T* in A^* by adding to the embedding all new vertices, a path from r' 
to the root of the embedding and edges (r',p), {p,x'), {p,x). Now consider an embedding of T* in A^*. 
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Since x and x' are siblings in T* , this embedding necessarily contains the newly added vertices and edges 
except for the edge (f,a;). Thus, the restriction of the embedding of T* in TV* to an embedding of T in N* 
does not contain any of the newly added vertices and edges and is thus an embedding of T in N . Thus, N* 
displays T* if and only if N displays T. By recursively applying this argument, it follows that N' displays T' 
if and only if TV displays T. □ 

6. Open Problem 

For a vertex v and a leaf x of some phylogenetic network, we say that z; is a stable ancestor of x if 
all directed paths from the root to x pass through v. A network is said to be reticulation-visible if each 
reticulation is a stable ancestor of some leaf. Recently, it was shown that Cluster Containment is 
polynomial-time solvable for reticulation- visible networks [7j . This class of networks contains, but is more 
general than, the class of tree-child networks. Thus, the tantalizing question remaining open after this work 
is whether Tree Containment is also polynomial-time solvable for reticulation-visible networks. 

Note that Tree Containment cannot simply be solved by checking if each cluster of the input tree T is 
a softwired cluster of the input network N (using an algorithm for Cluster Containment). This approach 
fails because, even if all clusters of T are softwired clusters of TV, and even if N is reticulation- visible, it might 
be that N does not display T, see [5]. Thus, there is no obvious reduction from Cluster Containment 
to Tree Containment or vice versa. 
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